Data Visualization Methods, Data Visualization Devices, Data Visualization Apparatuses, and Articles of Manufacture

ABSTRACT

Data visualization methods, data visualization devices, data visualization apparatuses, and articles of manufacture are described according to some aspects. In one aspect, a data visualization method includes accessing a plurality of initial documents at a first moment in time, first processing the initial documents providing processed initial documents, first identifying a plurality of first associations of the initial documents using the processed initial documents, generating a first visualization depicting the first associations, accessing a plurality of additional documents at a second moment in time after the first moment in time, second processing the additional documents providing processed additional documents, second identifying a plurality of second associations of the additional documents and at least some of the initial documents, wherein the second identifying comprises identifying using the processed initial documents and the processed additional documents, and generating a second visualization depicting the second associations.

RELATED PATENT DATA

This application is a divisional of and claims priority to U.S. patent application Ser. No. 11/256,225 filed Oct. 21, 2005, the teachings of which are incorporated herein by reference.

GOVERNMENT RIGHTS STATEMENT

This invention was made with Government support under Contract DE-AC0676RLO1830 awarded by the U.S. Department of Energy. The Government has certain rights in the invention.

TECHNICAL FIELD

This invention relates to data visualization methods, data visualization devices, data visualization apparatuses, and articles of manufacture.

BACKGROUND

Text analysis tools are gaining popularity among analysts. Many text analysis tools operate on a fixed set of data, which may be appropriate in a number of applications, such as common evaluation or duplication of results. However, analyzing fixed sets of data can lead to a focus on fixed “bucket of data” approaches, whereas a user may instead utilize profiles or standing queries that continually reflect the latest information at different moments in time.

A user may benefit from a visual analysis system which allows the user to add new documents to an ongoing exploration. However, if the visualization is recomputed from scratch each time documents are added, an analyst may lose the context and exploration results stored from previous work. Further, analysts may not be able to compare differences between visualizations if they exit the visualization before new computations take place.

As described below, at least some aspects of the disclosure provide improved data visualization methods and apparatus.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the invention are described below with reference to the following accompanying drawings.

FIG. 1 is a block diagram of a data visualization apparatus according to one embodiment.

FIG. 2 is a screen display illustrating an exemplary visualization of a plurality of associations of a plurality of documents at a given moment in time according to one embodiment.

FIG. 3 is an illustrative representation of dynamic association of documents and sequence numbers at a plurality of moments in time according to one embodiment.

FIG. 4 is an illustrative representation of an example forward index indicating the words present in respective documents according to one embodiment.

FIG. 5 is an illustrative representation of a reverse index indicating the documents in which given words are present according to one embodiment.

FIG. 6 is a screen display illustrating information regarding documents arranged by date/time of publication compared to receipt of the documents according to one embodiment.

FIGS. 7A and 7B are user interfaces which may be used to control operations of the data visualization apparatus according to one embodiment.

DETAILED DESCRIPTION

At least some aspects of the disclosure provide methods and apparatus for processing text contained in a plurality of documents and generating visualizations resulting from the processing. Exemplary processing may create associations of documents with one another. For example, in one embodiment, the analysis may generate a plurality of clusters of documents, wherein documents of a given cluster may be considered to be associated with one another (e.g., related to a common topic). Labels may also be provided to identify clusters and assist a user with analysis of the documents.

A document may refer to a communication comprising a plurality of text words. Some examples of documents which may be processed and analyzed include publications (e.g., newspaper articles, magazine articles, books), word processor files, e-mails, chat room communications, speech transcriptions, etc.

At least some aspects of the disclosure analyze documents which may become dynamically available to the apparatus, for example, by publication, creation, interception or other means. The analysis is performed at a plurality of moments in time using documents which are present for processing and visualization. In one embodiment, the visualizations may correspond to documents which are received in a constantly moving window of time. For example, the processing circuitry may generate visualizations using documents which have been received within a fixed period of time relative to the present moment in time. Documents which have been present for a predetermined amount of time may be aged off as outside of the moving window of interest while newly received documents are added. Accordingly, the visualizations may be updated at a plurality of moments in time, corresponding to processing of the documents within the window at different moments in time as time progresses. Resultant analyses of the documents may be displayed upon a computer screen for a user at a plurality of moments in time as new documents are made available and/or aged documents are discarded in one embodiment. Some aspects permit a user to pause dynamic updates of a visual representation of the analysis if a given representation is of interest to the user. Additional aspects are described with respect to exemplary illustrative embodiments.

Referring to FIG. 1, an exemplary data visualization apparatus according to one embodiment is illustrated with respect to reference numeral 10. In the depicted embodiment, data visualization apparatus 10 is implemented as a computing device, such as a work station or personal computer, and may include a communications interface 12, processing circuitry 14, storage circuitry 16, and a user interface 18. Other embodiments of apparatus 10 may include more, less and/or alternative components.

Communications interface 12 is arranged to implement communications of apparatus 10 with respect to a network, external devices, etc. (not shown). For example, communications interface 12 may be arranged to communicate information bi-directionally with respect to apparatus 10. Communications interface 12 may be implemented as a network interface card (NIC), serial or parallel connection, USB port, Firewire interface, flash memory interface, floppy disk drive, or any other suitable arrangement for communicating with respect to apparatus 10.

In one embodiment, communications interface 12 is configured to dynamically receive and access documents for processing by apparatus 10. For example, communications interface 12 may be coupled with any appropriate source of documents, including for example static or dynamic databases, news feeds, email interceptors, etc. The source may dynamically provide documents to apparatus 10 as the documents are published, captured or otherwise made available.

In one embodiment, processing circuitry 14 is arranged to process data, control data access and storage, issue commands, and control other desired operations. Processing circuitry 14 may operate to access documents which are received by communications interface 12, to identify associations of the documents and to generate visualizations of the associations. Processing circuitry 14 may dynamically access documents which are made available on an ongoing basis and update the visualizations using the newly received documents in one embodiment. As mentioned above, documents may also be removed from the associations and the visualizations after a certain amount of time has elapsed since their reception by apparatus 10, creation, publication, or another criterion according to an additional embodiment. Additional details regarding processing and generation of visualizations are described below according to exemplary embodiments.

Processing circuitry 14 may comprise circuitry configured to implement desired programming provided by appropriate media in at least one embodiment. For example, the processing circuitry 14 may be implemented as one or more of a processor and/or other structure configured to execute executable instructions including, for example, software and/or firmware instructions, and/or hardware circuitry. Exemplary embodiments of processing circuitry 14 include hardware logic, PGA, FPGA, ASIC, state machines, and/or other structures alone or in combination with a processor. These examples of processing circuitry 14 are for illustration and other configurations are possible.

Storage circuitry 16 is configured to store programming such as executable code or instructions (e.g., software and/or firmware), electronic data, databases, or other digital information and may include processor-usable media. Exemplary programming may include programming configured to cause apparatus 10 to process, analyze and display information regarding a dynamically changing collection of documents. Processor-usable media includes any computer program product or article of manufacture which can contain, store, or maintain programming, data and/or digital information for use by or in connection with an instruction execution system including processing circuitry in the exemplary embodiment. For example, exemplary processor-usable media may include any one of physical media such as electronic, magnetic, optical, electromagnetic, infrared or semiconductor media. Some more specific examples of processor-usable media include, but are not limited to, a portable magnetic computer diskette, such as a floppy diskette, zip disk, hard drive, random access memory, read only memory, flash memory, cache memory, and/or other configurations capable of storing programming, data, or other digital information.

At least some embodiments or aspects described herein may be implemented using programming stored within appropriate storage circuitry described above and/or communicated via a network or other transmission media and configured to control appropriate processing circuitry. For example, programming may be provided via appropriate media including, for example, articles of manufacture.

User interface 18 is configured to interact with a user including conveying data to a user (e.g., displaying data for observation by the user, audibly communicating data to a user, etc.) as well as receiving inputs from the user (e.g., tactile input, voice instruction, etc.). Accordingly, in one exemplary embodiment, the user interface 18 may include a display 20 (e.g., cathode ray tube, LCD, etc.) configured to depict visual information as well as a keyboard, mouse and/or other input device 22. Any other suitable apparatus for interacting with a user may also be utilized.

The above-described embodiment comprises an integrated unit configured to process documents and display visualizations of the associations of the documents and related information for observation by a user. Other configurations are possible wherein apparatus 10 is configured as a networked server configured to process documents and generate files for creating visualizations. One or more clients (not shown) may use displays of respective terminals configured to access the files for creating the visualizations for observation by one or more users. Other configurations of apparatus 10 are possible.

Referring to FIG. 2, an exemplary screen display 30 depicted by display 20 and comprising a visualization of associations of documents at a moment in time is shown. Screen display 30 shows one possible example for depicting results of processing a set of documents at a moment in time. According to one implementation, data visualization apparatus 10 may be configured to implement SPIRE or IN-SPIRE™ visual analytics systems available from the Pacific Northwest National Laboratory at http://in-spire.pnl.gov and described for example in U.S. Pat. Nos. 4,839,853, 6,298,174, 6,484,168, 6,584,220, and 6,772,170, the teachings of which are incorporated herein by reference. Other arrangements for depicting the results of document processing may be provided in other embodiments. For example, as described below, screen display 30 illustrates clusters of associations of documents which may be incrementally updated. Other formats for depicting associations of documents, which may also be incrementally updated, are possible, including for example a landscape metaphor and/or a rectangular metaphor.

In the illustrated screen display 30, a plurality of documents are represented by respective dots 32 which may be arranged in a plurality of clusters 34. Documents which are associated with one another as a result of the processing by apparatus 10 may be arranged in one of the clusters 34. Additionally, the processing circuitry 14 may determine a plurality of labels 36 and associate them with the clusters 34; the labels 36 are generally indicative of the content or subject matter of the documents associated with the respective cluster 34. A user may interact with the visualization of screen display 30 via user interface 18. In one example, a user may select a dot 32 of interest, and the selection may provide additional details, such as the title, author, publication date, contents, etc. of the respective document represented by the selected dot.

As mentioned above, data visualization apparatus 10 is configured in one embodiment to access and process a dynamically changing set of documents, and accordingly, the screen display 30 may change over time to reflect changes in the corpus of documents being analyzed at different moments in time. In addition, in one embodiment, information regarding dynamic changes to a collection of documents may be depicted for a user via the screen display 30. For example, documents which are received and processed relatively recently by apparatus 10 may be displayed as dots 32 having a different color than other dots and, after a period of time (e.g., 10 minutes), the color may be changed to the color of the other dots 32.
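The recent-document highlighting can be illustrated with a minimal sketch, assuming each document carries a receipt timestamp. The function name `dot_color`, the color names, and the `HIGHLIGHT_PERIOD` constant (set to the 10 minutes the text gives only as an example) are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical highlight period; the text offers 10 minutes only as an example.
HIGHLIGHT_PERIOD = timedelta(minutes=10)


def dot_color(received_at: datetime, now: datetime,
              base: str = "gray", recent: str = "orange") -> str:
    """Return the display color for a document dot 32.

    Documents received within HIGHLIGHT_PERIOD of `now` get the `recent`
    color; afterwards they revert to the color shared by the other dots.
    """
    if now - received_at <= HIGHLIGHT_PERIOD:
        return recent
    return base


now = datetime(2005, 10, 21, 12, 0)
print(dot_color(datetime(2005, 10, 21, 11, 55), now))  # orange
print(dot_color(datetime(2005, 10, 21, 11, 30), now))  # gray
```

Because the color is derived from the timestamp each time the display is refreshed, no separate timer per document is needed in this sketch.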

The above-mentioned IN-SPIRE™ data analytics system operated upon static data sets. For example, a document corpus containing a given number of documents is accessed and processed, and screen display 30 may be generated following the processing. At least some aspects of the disclosure describe methods and apparatus for processing and displaying associations of documents (e.g., using IN-SPIRE) which may be dynamically received and/or aged off (or otherwise added to or removed from a set of documents being analyzed) at a plurality of moments in time. One embodiment of the disclosure reduces an amount of time used by the apparatus 10 for processing a dynamic collection of documents. In one embodiment, results of previous processing of documents may be maintained and used for subsequent associations with newly received documents.

Some embodiments describe processing of documents using incremental indexing schemes to facilitate the identification of documents and associations of documents of dynamically changing data sets. Indexes may be generated and used by processing circuitry 14 to determine associations of documents during processing of the documents. An exemplary incremental indexing scheme may be incrementally updated at different moments in time, for example, corresponding to the timing of reception of new documents by apparatus 10 in but one operational embodiment. Increments may refer to the status of visualizations and associations of documents of the dynamic collection at different moments in time, based upon the documents present for analysis at the respective moments in time. At least some aspects of the disclosure reduce the processing performed by processing circuitry 14 inasmuch as indexes may be updated without having to reprocess documents which have already been processed. Additional details are described with respect to exemplary embodiments below.

Additional aspects relate to aging off documents which have been processed and displayed in visualizations of display 20. In one example, documents are time-stamped upon receipt by apparatus 10 and information may be obtained regarding a date/time of publication of the respective documents. A threshold may be set (e.g., 1 hour, 1 day, etc.) which specifies when documents are aged off and removed from the system. In one example, the processing circuitry 14 may analyze the documents present in the system with respect to the threshold and age off (e.g., remove) documents from the visualization, databases, and indices of apparatus 10 described below. In one example, processing circuitry 14 may perform the aging analysis at intervals corresponding to the date/time of receipt of new documents by the apparatus 10 or the date/time of publication. Intervals for performing the aging analysis may be based upon other criteria in other embodiments.
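The threshold test of the aging analysis can be sketched as follows, assuming each document record holds both a receipt timestamp and a publication date/time. The `Document` class, the `age_off` function, and its `basis` parameter (which selects the timestamp compared with the threshold) are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Document:
    name: str
    received_at: datetime   # timestamp applied on receipt by apparatus 10
    published_at: datetime  # publication date/time, when it can be obtained


def age_off(docs, now, threshold, basis="received_at"):
    """Split docs into (kept, aged_off).

    A document is aged off when the time elapsed since the chosen timestamp
    (receipt by default, or "published_at") exceeds `threshold`.
    """
    kept, aged = [], []
    for doc in docs:
        elapsed = now - getattr(doc, basis)
        (aged if elapsed > threshold else kept).append(doc)
    return kept, aged


now = datetime(2005, 10, 21, 12, 0)
docs = [
    Document("A", received_at=datetime(2005, 10, 21, 10, 30),
             published_at=datetime(2005, 10, 21, 10, 0)),
    Document("B", received_at=datetime(2005, 10, 21, 11, 30),
             published_at=datetime(2005, 10, 21, 9, 45)),
]
kept, aged = age_off(docs, now, threshold=timedelta(hours=1))
print([d.name for d in kept], [d.name for d in aged])  # ['B'] ['A']
```

Switching `basis` to `"published_at"` ages off both documents in this example, which shows why the choice between receipt and publication time matters.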

Referring to FIG. 3, dynamic associations for identifying documents which are received and aged off at different moments in time are described according to one embodiment. In FIG. 3, a plurality of sequential moments in time corresponding to plural increments are illustrated and progress from t1 (earliest) to t7 (latest). Associated with individual ones of the moments in time are a plurality of documents 40 (e.g., files including the text of the documents) represented by letters, and a plurality of sequence numbers 42 which are associated with respective ones of the documents. The documents 40 and sequence numbers 42 are arranged from left to right corresponding to the order of arrival of the documents at apparatus 10. More specifically, in the example of FIG. 3, the leftmost document and sequence number correspond to the document which was first received by apparatus 10, while the rightmost document and sequence number correspond to the document which was most recently received by apparatus 10.

Time t1 may correspond to an initial moment in time where documents A-E are available for processing by apparatus 10. Processing circuitry 14 may assign sequential sequence numbers 0-4 to respective ones of the documents A-E.

At time t2, no documents have aged off while new documents F, G have been received, and processing circuitry 14 may assign subsequent sequence numbers 5, 6 to documents F, G.

At time t3, documents A-E have aged off while new documents H-J have been received. Processing circuitry 14 may shift the association of the sequence numbers and the documents such that the oldest document remaining in apparatus 10 is assigned sequence number 0. Accordingly, the documents F, G which remain from time t2 are shifted to sequence numbers 0, 1 and new documents H-J are assigned sequence numbers 2-4.

At time t4, documents F, G have aged off while new documents K, L have been received. Processing circuitry 14 may again shift the association of the sequence numbers and the documents (documents H-J are shifted to sequence numbers 0-2), and the next subsequent sequence numbers 3, 4 are associated with the new documents K, L as shown.

At time t5, no documents have aged off while new documents M, N have been received. Processing circuitry 14 may associate the next subsequent sequence numbers 5, 6 with the new documents M, N as shown.

At time t6, documents H-L have aged off while new documents O-Q have been received. Processing circuitry 14 may again shift the association of the sequence numbers and the documents (documents M, N are shifted to sequence numbers 0, 1), and the next subsequent sequence numbers 2-4 are associated with the new documents O-Q as shown.

At time t7, documents M, N have aged off while new documents R, S have been received. Processing circuitry 14 may again shift the association of the sequence numbers and the documents (documents O-Q are shifted to sequence numbers 0-2), and the next subsequent sequence numbers 3, 4 are associated with the new documents R, S as shown. According to one embodiment, at any moment in time, the described usage of sequence numbers permits processing circuitry 14 to identify the desired document files for processing to generate a data visualization, such as that of FIG. 2, corresponding to the moment in time when processing of the documents occurs.
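The renumbering of FIG. 3 can be sketched as follows, assuming a document's sequence number is simply its position in an arrival-ordered list. The function names `advance` and `sequence_numbers` are hypothetical; the increments reproduce the t1 to t7 example described above.

```python
def advance(window, aged_off, received):
    """Return the arrival-ordered document list for the next increment.

    A document's sequence number is its position in the list, so dropping
    aged-off documents shifts the survivors down toward 0 and the newly
    received documents take the next numbers in order of arrival.
    """
    survivors = [doc for doc in window if doc not in aged_off]
    return survivors + list(received)


def sequence_numbers(window):
    return {doc: seq for seq, doc in enumerate(window)}


# (aged-off documents, newly received documents) at t1..t7, per FIG. 3
increments = [
    (set(), "ABCDE"),
    (set(), "FG"),
    (set("ABCDE"), "HIJ"),
    (set("FG"), "KL"),
    (set(), "MN"),
    (set("HIJKL"), "OPQ"),
    (set("MN"), "RS"),
]

window = []
for t, (aged_off, received) in enumerate(increments, start=1):
    window = advance(window, aged_off, received)
    print(f"t{t}", sequence_numbers(window))
```

The last line printed is `t7 {'O': 0, 'P': 1, 'Q': 2, 'R': 3, 'S': 4}`. Storing per-document data in the same list order means a surviving document's stored results move with it and never need to be recomputed.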

According to one embodiment, processing circuitry 14 may maintain a cumulative vocabulary list of features present in the documents being analyzed by apparatus 10 at a given moment in time. Features include any type of feature which may be measured in the documents. For example, features may include words, names, letter sequences, or phrases in illustrative examples. Although the following discussion, including FIGS. 4 and 5, proceeds with respect to processing using words, it is to be understood that other features may be analyzed in other embodiments.

Upon receipt of documents within apparatus 10, the processing circuitry 14 performs processing of the documents, including analyzing the words of the documents, and adds the words present in the documents to a cumulative vocabulary list. Common words such as “the”, “or”, “and”, “a”, etc. may be omitted from the vocabulary list.

The vocabulary list comprises a list of words (and/or other features) present within documents being visualized by apparatus 10 at a given moment in time. The list may also indicate the number of documents in which the respective words are present. Accordingly, if words present in new documents are not in the vocabulary list, processing circuitry 14 may add the new words to the vocabulary list. If words present in the new documents are already present in the vocabulary list, processing circuitry 14 may increment the value indicating the number of documents in which a word is present. In addition, the words of the vocabulary list may be associated with unique identifiers (e.g., word numbers) which may be thereafter used by apparatus 10 to numerically identify the respective words. The words may be arranged alphabetically at an initial moment in time and numbered sequentially in one embodiment. New words may be assigned subsequent ordered numbers as the new words are added in one embodiment.

When a document is aged off, the processing circuitry 14 may, for individual words of the removed document, decrement the value of the number of documents in which the respective word is present. If the number of documents for a given word reaches zero at a given moment in time, then the word may be removed from the vocabulary list as not being present in any of the documents being currently analyzed by the apparatus 10.
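The cumulative vocabulary list can be sketched as a small class, assuming documents arrive already tokenized into lowercase words. The `Vocabulary` class, its methods, and the four-word `STOP_WORDS` set (just the common words named above) are hypothetical. In this sketch, numbers freed by removed words are not reused; the text also allows word numbers to be renumbered incrementally, which is not shown here.

```python
STOP_WORDS = {"the", "or", "and", "a"}  # only the common words named above


class Vocabulary:
    """Cumulative vocabulary: word -> word number and document count."""

    def __init__(self):
        self.number = {}     # word -> unique word number
        self.doc_count = {}  # word -> number of current documents containing it
        self._next_number = 0

    def add_documents(self, docs):
        """Add a batch of tokenized documents (each a list of lowercase words)."""
        doc_words = [set(doc) - STOP_WORDS for doc in docs]
        # Words new to the vocabulary are numbered alphabetically, continuing
        # after the highest number assigned so far.
        new_words = sorted(set().union(*doc_words) - set(self.number))
        for word in new_words:
            self.number[word] = self._next_number
            self._next_number += 1
            self.doc_count[word] = 0
        for words in doc_words:
            for word in words:
                self.doc_count[word] += 1

    def remove_document(self, doc):
        """Decrement counts for an aged-off document; drop words that reach zero."""
        for word in set(doc) - STOP_WORDS:
            self.doc_count[word] -= 1
            if self.doc_count[word] == 0:
                del self.doc_count[word]
                del self.number[word]


vocab = Vocabulary()
vocab.add_documents([["the", "rat", "ran"], ["a", "cat", "saw", "the", "rat"]])
print(vocab.number)     # {'cat': 0, 'ran': 1, 'rat': 2, 'saw': 3}
vocab.remove_document(["the", "rat", "ran"])
print(vocab.doc_count)  # {'cat': 1, 'rat': 1, 'saw': 1}
```

Counting each word once per document (via `set(doc)`) keeps the count equal to the number of documents containing the word, as the text describes, rather than the total number of occurrences.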

Processing circuitry 14 may implement processing including generation of forward and inverted indices for use in association of documents for visualization generation in one embodiment. As described below, the indices may be periodically and dynamically modified or recalculated corresponding to the dynamic addition and/or removal of documents from the visualization.

Referring to FIG. 4, an exemplary forward index 50 generated during processing by processing circuitry 14 with respect to two documents is shown. To generate the forward index, the processing circuitry 14 associates each document (identified by one of document sequence numbers 42) with the word (and/or other feature) contents of the respective document using the vocabulary list, a plurality of word (and/or other feature) numbers 52 and the words (and/or other features) of the respective documents. The words of the documents are identified by the processing circuitry 14, and the vocabulary list is updated and used to create the forward index. The individual associations of the documents and word contents of the forward index are maintained during the presence of the respective documents in the visualizations prior to being aged off. The documents 0, 1 identified by document sequence numbers 42 and associated with the word numbers 52 may be referred to as processed documents and may be used to create associations of the documents for visualization, for example, using IN-SPIRE.

The forward index operates to associate or identify the words present within the documents associated with sequence numbers 0, 1 in the example of FIG. 4. Word numbers 52 from the vocabulary list are assigned to the words in the example of FIG. 4 for identification of the words. The word “rat” is assigned word number 1, as shown in both documents 0, 1. The forward index includes the sequence number 42 of the respective documents 0, 1 and the respective word numbers 52 corresponding to the words present within respective documents 0, 1 in one embodiment. Accordingly, as shown, the documents and word contents of the documents are associated using the forward index via the document sequence numbers 42 and word numbers 52 in one embodiment. As new documents are accessed by apparatus 10, the new documents may be processed and added to the existing already processed documents of the forward index and used to generate subsequent associations of documents for visualization. The new visualizations may use both the previously processed documents and newly processed documents to avoid or reduce duplicative processing or computations in one embodiment.
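A forward index in the spirit of FIG. 4 can be sketched as a list whose position is the document sequence number 42 and whose entries map word numbers 52 to occurrence counts. The vocabulary below is hypothetical apart from "rat" being word number 1, and the function name `forward_entry` is illustrative only.

```python
from collections import Counter


def forward_entry(words, word_numbers):
    """Map one document's words to {word number: occurrences}.

    Words absent from the vocabulary (e.g. stop words) are skipped.
    """
    return dict(Counter(word_numbers[w] for w in words if w in word_numbers))


# Hypothetical vocabulary; only "rat" -> 1 comes from the FIG. 4 example.
word_numbers = {"cat": 0, "rat": 1, "ran": 2, "saw": 3}

# The list position is the document sequence number 42.
forward_index = [
    forward_entry(["the", "rat", "ran"], word_numbers),              # document 0
    forward_entry(["a", "cat", "saw", "the", "rat"], word_numbers),  # document 1
]

# A newly accessed document is processed once and appended; the entries for
# documents 0 and 1 are reused as-is rather than recomputed.
forward_index.append(forward_entry(["cat", "ran"], word_numbers))    # document 2
print(forward_index)
```

Only the new document is tokenized and looked up in the vocabulary when it arrives, which is the saving the text attributes to keeping previously processed documents in the forward index.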

Referring to FIG. 5, an exemplary reversed or inverted index 60 is shown which may be calculated from the forward index 50. The reversed index operates to identify, for a given word, the documents in which the word is present. The words are identified by word numbers 52 and the documents are identified by document sequence numbers 42 in the illustrative example. As shown in FIG. 5, the number of occurrences of the word in the respective document is indicated by the frequency 62 (each word occurs only once in documents 0, 1 in the example of FIG. 5).
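Calculating an inverted index from a forward index can be sketched as follows, assuming the forward index is a list indexed by sequence number whose entries map word numbers to frequencies. The function name `invert` and the two hypothetical documents are illustrative only.

```python
def invert(forward_index):
    """Build {word number: {document sequence number: frequency}}.

    forward_index is a list whose position is the document sequence number
    and whose entries are {word number: frequency} dicts.
    """
    inverted = {}
    for seq, entry in enumerate(forward_index):
        for word_no, freq in entry.items():
            inverted.setdefault(word_no, {})[seq] = freq
    return inverted


forward_index = [{1: 1, 2: 1}, {0: 1, 1: 1, 3: 1}]  # hypothetical documents 0, 1
print(invert(forward_index))
# {1: {0: 1, 1: 1}, 2: {0: 1}, 0: {1: 1}, 3: {1: 1}}
```

Word number 1 maps to both documents with frequency 1 in each, mirroring the FIG. 4 and FIG. 5 description of a word shared by documents 0, 1.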

In accordance with one dynamic embodiment, processing circuitry 14 is configured to update the visualization (e.g., FIG. 2) corresponding to the documents present in the apparatus 10 at respective moments in time. The forward and reverse indices are used by processing circuitry 14 during processing to generate the associations of the documents, including processing comprising generating topicalities, association matrices and/or document vectors in accordance with SPIRE or IN-SPIRE visualizations of some exemplary embodiments.

Processing circuitry 14 may update the visualization at a plurality of increments or intervals to include new documents and remove aged-off documents in one embodiment. Intervals may be defined in one embodiment by the reception of one or more new documents by apparatus 10. In other embodiments, intervals may be defined differently, such as corresponding to a plurality of moments in time.

According to some embodiments, processing circuitry 14 may utilize information where possible from previously processed documents to reduce computations, processing time, etc. at new intervals. Updating the sequence numbers during the dynamic reception of new documents and aging-off of old documents facilitates the leveraging of previously performed computations and the identification of specific documents at different moments in time and corresponding to different intervals. In addition, the mapping of words (or features) and respective word (or feature) numbers 52 may also be incrementally updated in a fashion similar to the embodiment described with respect to FIG. 3.

For example, at an individual interval (e.g., corresponding to the arrival of one or more new documents at a moment in time in the described example), the processing circuitry 14 may update the association of documents and sequence numbers as described with respect to the exemplary embodiment of FIG. 3 and timestamp the new documents which are received by apparatus 10. Thereafter, the processing circuitry 14 may identify documents which should be aged off. In one embodiment, the processing circuitry 14 compares the timestamps of the documents with respect to a threshold indicative of an amount of time corresponding to the window of documents being processed. If the amount of time from the timestamp of a given document to the present time exceeds the threshold, the document may be aged off. In one embodiment, the visualizations do not depict aged-off documents or associations of the aged-off documents.

According to one embodiment, the processing circuitry 14 updates the vocabulary list and may use the forward index to identify the words present in a document to be aged off. The counts of the individual words present in the aged-off document are decremented in the word vocabulary. If the count for a given word drops to zero as a result of the decrementing, the word may be removed from the word vocabulary.

Thereafter, the counts of words of the new documents which are already present in the vocabulary list are incremented, or if a word appears for the first time, the word may be added to the vocabulary list with a count of one.

Next, the processing circuitry 14 may update an existing forward index by removing aged-off documents and associating updated document sequence numbers 42 with the word numbers 52 of the respective documents (e.g., with respect to the example of FIG. 3 at the increment of time t4, the processing circuitry 14 removes documents F, G from the forward index and reassigns the associations of the word numbers 52 to the new sequence numbers). Thereafter, the new documents are added to the forward index with the associations of the sequence numbers 42 and the respective word numbers 52 corresponding thereto for the new documents. According to one embodiment, documents already present in the apparatus 10 at an interval are not reprocessed for the forward index; instead, the sequence numbers are reassigned, permitting the documents to be identified without the computational cost and time of reprocessing such documents to identify the words present in the already processed documents.
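One interval of this incremental update can be sketched as follows, assuming the forward index is a list whose position is the sequence number and that document counts are kept per word number. The function names `remove_aged_entries` and `append_new_entries` and all word numbers are hypothetical; the documents follow the FIG. 3 increment at t4, where F, G age off and K, L arrive.

```python
def remove_aged_entries(forward_index, doc_count, aged_seqs):
    """Drop aged-off documents from the forward index and decrement counts.

    forward_index: list whose position is the document sequence number; each
    entry maps word number -> frequency. doc_count maps word number -> number
    of current documents containing that word.
    """
    aged_seqs = set(aged_seqs)
    for seq in aged_seqs:
        for word_no in forward_index[seq]:  # words present in the aged-off document
            doc_count[word_no] -= 1
            if doc_count[word_no] == 0:
                del doc_count[word_no]      # word no longer in any current document
    # Surviving entries are kept unchanged; only their positions (sequence numbers) shift.
    return [entry for seq, entry in enumerate(forward_index) if seq not in aged_seqs]


def append_new_entries(forward_index, doc_count, new_entries):
    """Append already-built entries for new documents and increment counts."""
    for entry in new_entries:
        for word_no in entry:
            doc_count[word_no] = doc_count.get(word_no, 0) + 1
        forward_index.append(entry)
    return forward_index


# FIG. 3, increment t4: documents F-J hold sequence numbers 0-4 (word numbers are hypothetical).
forward_index = [{0: 1}, {0: 1, 1: 2}, {1: 1}, {2: 1}, {3: 1}]  # F, G, H, I, J
doc_count = {0: 2, 1: 2, 2: 1, 3: 1}

forward_index = remove_aged_entries(forward_index, doc_count, aged_seqs={0, 1})  # F, G age off
forward_index = append_new_entries(forward_index, doc_count, [{3: 1, 4: 1}, {4: 2}])  # K, L arrive
print(forward_index)  # [{1: 1}, {2: 1}, {3: 1}, {3: 1, 4: 1}, {4: 2}]
print(doc_count)      # {1: 1, 2: 1, 3: 2, 4: 2}
```

Word number 0 appears only in F and G, so its count reaches zero and it leaves the vocabulary, while the entries for H, I, J are carried over without re-reading their text.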

After the updating of the forward index, the inverted index may be entirely regenerated corresponding to the newly updated forward index, or differentially adjusted in a process similar to the described adjustment of the forward index. After the generation of the forward and inverted indices, processing circuitry 14 may utilize the indices to perform the processing, including associating the documents with one another. The indices assist with identification of the documents and words during the processing to form the vectors, matrices, etc. In the exemplary embodiment wherein SPIRE or IN-SPIRE processing is implemented, the processing circuitry may use the forward and inverted indices to perform topicality processing for identifying words useful for discrimination of the documents and forming clusters, calculate association matrices, calculate document vectors, and generate visualization files which may be used to form the visualizations upon display 20 corresponding to the respective moments in time. The above-exemplary processing may be repeated at each subsequent interval. In the described embodiment, documents received at different moments in time (via different increments) may be associated with one another prior to the documents being aged off.
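The differential alternative to regenerating the inverted index can be sketched as follows, assuming the inverted index maps word numbers to {sequence number: frequency} and that aged-off documents are removed as in FIG. 3. The function name `adjust_inverted` and its parameters are hypothetical, and the example data use the same hypothetical t4 increment (documents F-J, of which F, G age off and K, L arrive).

```python
from bisect import bisect_left


def adjust_inverted(inverted, old_size, aged_seqs, new_entries):
    """Differentially update {word number: {sequence number: frequency}}.

    Postings for aged-off documents are dropped, surviving sequence numbers
    are shifted down by the number of aged-off documents that preceded them,
    and postings for the new documents are added after the survivors.
    """
    aged = sorted(set(aged_seqs))
    aged_set = set(aged)
    adjusted = {}
    for word_no, postings in inverted.items():
        kept = {seq - bisect_left(aged, seq): freq
                for seq, freq in postings.items() if seq not in aged_set}
        if kept:  # a word whose documents all aged off disappears
            adjusted[word_no] = kept
    first_new = old_size - len(aged)
    for offset, entry in enumerate(new_entries):
        for word_no, freq in entry.items():
            adjusted.setdefault(word_no, {})[first_new + offset] = freq
    return adjusted


# Hypothetical inverted index of documents F-J (sequence numbers 0-4).
inverted = {0: {0: 1, 1: 1}, 1: {1: 2, 2: 1}, 2: {3: 1}, 3: {4: 1}}
print(adjust_inverted(inverted, old_size=5, aged_seqs={0, 1},
                      new_entries=[{3: 1, 4: 1}, {4: 2}]))
# {1: {0: 1}, 2: {1: 1}, 3: {2: 1, 3: 1}, 4: {3: 1, 4: 2}}
```

The result is the same index that full regeneration from the updated forward index would produce in this example; which route is cheaper in practice depends on how many postings must be shifted versus rebuilt.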

At least some aspects of the presently described embodiment reduce the processing performed by processing circuitry 14 to analyze a dynamically changing set of documents. For example, in one embodiment described above, the vocabulary list may be updated at a plurality of moments in time, including adding words to and deleting words from the existing list as the documents dynamically change. In addition, forward indexes may be dynamically updated using the sequence numbers. For example, previously indexed documents are not reprocessed as described with respect to FIG. 4; rather, the forward index is merely updated to add new documents and remove aged documents. New reverse indices may be created from the forward index following the respective updates of the forward index in one embodiment. As described above, the vocabulary list and forward index are dynamically updated by merely adding new documents and removing aged documents, as opposed to being entirely recalculated at the different increments, which conserves processing resources.

Referring to FIG. 6, an exemplary screen display 80 which may be generated by display 20 in accordance with additional embodiments is shown. The screen display 80 is a histogram depicting a plurality of vertical bars 82 at a plurality of x-axis locations 84 which correspond to the window of time of the documents being analyzed. Indicia at the locations 84 may describe the time intervals being utilized. For example, the indicia may show the date/time of publication of the respective documents (e.g., hours of publication are depicted in the illustrated example of FIG. 6). Other graphical representations may be used to depict the information shown in FIG. 6 in other embodiments.

In the exemplary illustration, the vertical bars 82 illustrate quantities of documents grouped by their publication date/time, namely the documents which are depicted using a visualization such as screen display 30 of FIG. 2. More specifically, in the illustrated figure, the vertical bars 82 are placed at x-axis locations 84 corresponding to a time of publication represented by hours of a day. As time progresses, the bars 82 move left across the screen display 80. Individual bars 82 may also include indicia identifying the quantity of documents represented by the respective bars 82, as shown in FIG. 6.
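The binning behind a histogram such as screen display 80 can be sketched as follows, assuming one-hour bins over a fixed display window; the function name and parameters are illustrative only. Bins are returned oldest first, so the first bin corresponds to the leftmost bar 82 and the last bin to the rightmost, most recent bar, and hours without publications still produce an empty bar.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_histogram(publication_times, window_start, window_end):
    """Hypothetical helper: (hour, count) pairs for each hour in the window, oldest first."""
    counts = Counter(
        t.replace(minute=0, second=0, microsecond=0)
        for t in publication_times
        if window_start <= t < window_end
    )
    bars = []
    hour = window_start.replace(minute=0, second=0, microsecond=0)
    while hour < window_end:
        bars.append((hour, counts.get(hour, 0)))  # empty hours yield zero-height bars
        hour += timedelta(hours=1)
    return bars


times = [
    datetime(2005, 10, 21, 9, 15),
    datetime(2005, 10, 21, 9, 50),
    datetime(2005, 10, 21, 11, 5),
]
bars = hourly_histogram(times, datetime(2005, 10, 21, 9), datetime(2005, 10, 21, 12))
```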

Time moves from right to left in the embodiment of FIG. 6: the rightmost bar 82 represents documents which have been most recently published, and the leftmost bar 82 represents documents which were published the longest time ago. Information about the documents recently received and those about to age off may be overlaid on this depiction, as described below in one embodiment.

One or more of the bars 82, or portions of the bars 82, may be distinguished from other bars 82 to convey information to a user in one embodiment. As shown in the example of FIG. 6, the middle bars 82 may be depicted using a base color while portions or entireties of other bars 82 may be depicted using a different color or otherwise distinguished for observation by a user. For example, a first alternate color 86 may be used to represent documents which were most recently received by apparatus 10, and a second alternate color 88 may be used to represent documents which are next to be aged off. As shown in FIG. 6, the time of receipt may differ from the time of publication.

The date of reception of the documents within apparatus 10 may be used to determine whether an alternate color 86 or 88 is suitable in one embodiment. The date/time of reception may be compared with a threshold to determine whether color 86 should be used to illustrate that the document has been recently received (e.g., color 86 is used if the time between reception and the present time is less than the threshold). Color 88 may be used to indicate imminent aging off of the documents if the time between the date/time of reception and the present time is greater than another threshold. The thresholds may be selected corresponding to the window of documents being displayed in the visualization. Other embodiments are possible for distinguishing bars 82, bar portions, or other representations of quantities of documents. For example, documents may be arranged in bars according to date/time of receipt, and/or other distinguishing colors 86, 88 may be used to convey information regarding date/time of publication in other embodiments. Further, other graphical formats may be used to illustrate quantities of documents in other embodiments.
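The two-threshold test can be sketched as below. The names `reception_color` and `bar_portions` and the color labels are hypothetical, and the sketch assumes the recent-document threshold is smaller than the aging threshold so that the two alternate colors never apply to the same document. Counting the colors over the documents of one bar gives the sizes of that bar's colored portions.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical labels for the base color and alternate colors 86 and 88.
BASE, RECENT, AGING = "base", "color 86", "color 88"


def reception_color(received_at, now, recent_threshold, aging_threshold):
    """Classify one document by the time elapsed since it was received."""
    age = now - received_at
    if age < recent_threshold:
        return RECENT  # recently received
    if age > aging_threshold:
        return AGING   # about to age off
    return BASE


def bar_portions(received_times, now, recent_threshold, aging_threshold):
    """Split one bar's documents into colored portions (color -> document count)."""
    return Counter(
        reception_color(t, now, recent_threshold, aging_threshold)
        for t in received_times
    )


now = datetime(2005, 10, 21, 12)
recent, aging = timedelta(hours=1), timedelta(hours=22)
portions = bar_portions(
    [now - timedelta(minutes=10), now - timedelta(hours=5), now - timedelta(hours=5)],
    now, recent, aging,
)
```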

At least one embodiment enables a user to change a mode of operation from active to paused in order to pause updates to the visualization depicted by display 20. During one embodiment of the active mode of operation, apparatus 10 continually updates the visualization depicted by display 20 to reflect associations of newly received documents and the removal of aged-off documents. In one embodiment, the updating is automatic, without user input, and based upon the dynamic documents being processed.

According to one embodiment, during a paused mode of operation, the visualization depicted when the paused mode was entered remains displayed; updates resulting from the arrival of new documents and the aging off of old documents may be calculated, but the resulting adjustments are not made to the visualization. Accordingly, in one embodiment, the state of the visualization when pause was entered is maintained until the user desires the active mode to be resumed. This may give the user an opportunity to study the visualization further without the depicted set of documents changing.

As mentioned above, in one embodiment, processing circuitry 14 may continue to compute new visualizations to account for new documents and aged-off documents (e.g., associate the new documents accessed during the paused mode of operation with the existing documents), although the resultant visualizations are not displayed during the paused mode of operation. This facilitates resumption of the active mode of operation, wherein apparatus 10 may add all pending increments to the visualization to provide the user with its current state upon the change back to the active mode of operation.

Referring to FIGS. 7A and 7B, a user interface 70, 70a which may be generated by display 20 is shown at different operational states of apparatus 10. For example, FIG. 7A corresponds to a live or active state of operation of apparatus 10 wherein screen displays 30, 80 (FIGS. 2 and 6) are dynamically updated as documents are received by apparatus 10. Indicia 72 of FIG. 7A indicate the active mode of operation. A slider tab 76 is positioned at the leftmost location of the slider, indicating that the increments of documents have been loaded into the visualizations and the status is current.

A button 74 may be selected by the user to toggle the mode of operation from active (FIG. 7A) to paused (FIG. 7B). Indicia 72a of FIG. 7B depict the status of “paused,” wherein increments of documents are not dynamically updated upon screen display 30 or 80 (FIGS. 2 and 6). The paused mode of operation may be useful to a user who wishes to study and/or interact with the visualizations at a given moment in time. Indicia 72a and slider tab 76 illustrate the length of time which has passed since the last increment of documents was loaded into the visualizations of screen displays 30, 80 (FIGS. 2 and 6). In addition, a color of interface 70a may be changed when an increment of new documents has been received but not yet updated in the visualization. A user may depress button 74a when desired to return to the dynamic mode of operation.

As mentioned, documents may be received by apparatus 10 during operation in the paused mode. In one embodiment, apparatus 10 may continue to process the documents even though the visualizations are not updated to reflect the presence of the new documents or the aging off of stale documents. For example, in one embodiment, processing circuitry 14 may update the vocabulary list, update the forward index, recalculate the reverse index, and perform other processing of newly received documents. The processed information may be used to create an up-to-date visualization when the user unpauses apparatus 10. If apparatus 10 has been paused for an extended period of time, a plurality of documents may have been received and processed over a plurality of increments. Processing circuitry 14 may roll all of the increments forward to return apparatus 10 to dynamic, up-to-date operation and to provide up-to-date visualizations when apparatus 10 is unpaused.
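The pause behavior can be modeled with a small controller in which processing always proceeds and only the display is frozen. This is a sketch under stated assumptions: each increment is treated as an opaque item, the `displayed` counter stands in for what is drawn on display 20, and all names are hypothetical. A nonzero `pending()` count corresponds to the condition in which interface 70a may change color.

```python
class VisualizationController:
    """Hypothetical active/paused controller; processing continues while paused."""

    def __init__(self):
        self.paused = False
        self.processed = []  # increments already processed (vocabulary, indices, etc.)
        self.displayed = 0   # number of increments reflected in the visualization

    def ingest_increment(self, increment):
        # Processing always happens, regardless of mode.
        self.processed.append(increment)
        if not self.paused:
            self.refresh()

    def refresh(self):
        # Roll all processed increments forward into the visualization.
        self.displayed = len(self.processed)

    def pending(self):
        # Increments received but not yet shown.
        return len(self.processed) - self.displayed

    def toggle(self):
        # Switch between active and paused; resuming brings the display up to date.
        self.paused = not self.paused
        if not self.paused:
            self.refresh()


controller = VisualizationController()
controller.ingest_increment("increment 1")  # active: shown immediately
controller.toggle()                         # pause
controller.ingest_increment("increment 2")  # processed but not shown
controller.ingest_increment("increment 3")
waiting = controller.pending()              # increments awaiting display
controller.toggle()                         # resume: roll every increment forward
```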

Other aspects of the disclosure implement synchronization operations to accommodate the paused and dynamic modes of operation of apparatus 10. For example, a file of a visualization may be accessed and partially processed by processing circuitry 14, or may otherwise be unavailable (e.g., responsive to user input), for short periods of time. Synchronization may preclude dynamic updates in the active mode until the file is released by processing circuitry 14.

More specifically, in one embodiment, processing circuitry 14 may be configured to operate plural processes in parallel, including a document ingest process and a visualization process. The ingest process is configured to access and process documents newly received by apparatus 10 (e.g., calculate or update the vocabulary list, the forward and reverse indexes, topicalities, the association matrices, document vectors and visualization files). Exemplary visualization files include data to control display of association of documents (e.g., raster data of clusters using screen display 30) and cluster labels corresponding to the clusters of documents presently processed by apparatus 10.

The ingest process may indicate to the visualization process when the processed data is ready for access and display. During an un-paused, dynamic mode of operation of apparatus 10 and following the data-ready indication from the ingest process, the visualization process may access the processed data (e.g., visualization files) and control display 20 to depict the respective visualizations responsive thereto. During a paused mode of operation, the ingest process may be configured to continue to process the incoming documents; however, the visualization process may be configured to maintain the visualizations in the state they were in when pausing occurred. Thereafter, when apparatus 10 is un-paused, the visualization process may access the processed data and bring the visualizations to a current, up-to-date state in one embodiment. Processing circuitry 14 may coordinate and synchronize the transfer of data from the ingest process to the visualization process to avoid errors (e.g., not accepting the processed data until files which may have been accessed by a user are cleared and available for updating using the newly processed data).
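A minimal sketch of the hand-off between the two processes follows, using Python threads in place of separate processes. A lock guards the shared visualization data, an event plays the role of the data-ready indication, and a second event represents the paused mode. The class name and attributes are hypothetical; an actual apparatus could instead use interprocess or file locks.

```python
import threading


class IngestToDisplay:
    """Hypothetical hand-off of processed data from ingest to visualization."""

    def __init__(self):
        self.lock = threading.Lock()     # guards the shared visualization data
        self.ready = threading.Event()   # data-ready indication from ingest
        self.paused = threading.Event()  # set while the user has paused updates
        self.latest = None               # most recently processed data
        self.shown = None                # data currently depicted

    def publish(self, data):
        # Ingest side: store processed data and signal readiness under the lock.
        with self.lock:
            self.latest = data
            self.ready.set()

    def update_display(self):
        # Visualization side: accept new data only when active and data is ready.
        if self.paused.is_set() or not self.ready.is_set():
            return False
        with self.lock:  # blocks while the data is held elsewhere
            self.shown = self.latest
            self.ready.clear()
        return True


def run_ingest(link, increments):
    for item in increments:
        link.publish(item)


link = IngestToDisplay()
ingest = threading.Thread(
    target=run_ingest, args=(link, ["increment 0", "increment 1", "increment 2"])
)
ingest.start()
ingest.join()

link.paused.set()
paused_result = link.update_display()   # display stays frozen while paused
link.paused.clear()
resumed_result = link.update_display()  # latest processed data is accepted
```

Setting the ready event inside the lock means the visualization side either sees both the new data and the signal, or neither, so an update cannot be lost between the two.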

User interfaces 70, 70a also depict a snapshot button 78 in one embodiment. Snapshot button 78 may be used to cause apparatus 10 to save a view of the visualizations (e.g., FIG. 2) and associations of documents at a given moment in time. A created snapshot may be used at subsequent moments in time to regenerate the visualizations and associations of documents as they existed when the snapshot was created. When snapshot button 78 is depressed by a user, processing circuitry 14 may store a dataset of document sequence numbers, the word vocabulary, forward and reverse indices, topicalities, association matrices, vectors, visualization files, and other information which may be accessed and used by processing circuitry 14 at subsequent moments in time to regenerate the visualization as it was when the snapshot was taken.
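A snapshot can be approximated by storing an independent deep copy of the analysis state, so that later ingest activity cannot alter it. The sketch keeps snapshots in memory, and the `AnalysisState` fields are an assumed subset of the stored items listed above; all names are hypothetical. Persisting snapshots to disk (for example with `pickle`) would be a straightforward variation.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class AnalysisState:
    """Hypothetical container for a subset of the data a snapshot stores."""
    sequence_numbers: list = field(default_factory=list)
    vocabulary: list = field(default_factory=list)
    forward_index: dict = field(default_factory=dict)
    inverted_index: dict = field(default_factory=dict)
    visualization_files: dict = field(default_factory=dict)


snapshots = {}  # snapshot label -> frozen copy of the state


def take_snapshot(label, state):
    # Deep copy so later updates to the live state do not leak into the snapshot.
    snapshots[label] = copy.deepcopy(state)


def restore_snapshot(label):
    # Return a copy so the stored snapshot itself stays unchanged.
    return copy.deepcopy(snapshots[label])


state = AnalysisState(sequence_numbers=[1, 2], vocabulary=["coast", "port"])
take_snapshot("t0", state)
state.sequence_numbers.append(3)  # ingest continues after the snapshot
restored = restore_snapshot("t0")
```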

In compliance with the statute, the invention has been described in language more or less specific as to structural and methodical features. It is to be understood, however, that the invention is not limited to the specific features shown and described, since the means herein disclosed comprise preferred forms of putting the invention into effect. The invention is, therefore, claimed in any of its forms or modifications within the proper scope of the appended claims appropriately interpreted in accordance with the doctrine of equivalents.

1-14. (canceled)
15. A data visualization apparatus comprising: a display configured to depict a plurality of visual images; processing circuitry coupled with the display and configured to access a plurality of documents and to control the display to depict the images using information of the documents; and wherein the processing circuitry is configured to control depiction of the images comprising information regarding a plurality of quantities of documents received by the data visualization apparatus at a plurality of different moments in time, and to control the depiction of at least a portion of at least one of the quantities of the documents distinguished from another quantity of the documents.
16. The apparatus of claim 15 wherein the processing circuitry is configured to control the depiction of the images comprising information regarding publication of the quantities of documents at a plurality of different moments in time.
17. The apparatus of claim 15 wherein the processing circuitry is configured to identify the portion using time information of when the documents are received by the data visualization apparatus.
18. The apparatus of claim 17 wherein the processing circuitry is configured to identify the portion as corresponding to documents which were most recently received by the data visualization apparatus compared with reception of documents of the other quantities of the documents.
19. The apparatus of claim 17 wherein the processing circuitry is configured to identify the portion as corresponding to documents which have dates of reception greater than a threshold.
20. The apparatus of claim 17 wherein the processing circuitry is configured to identify the portion as corresponding to documents which have dates of reception less than a threshold.
21. The apparatus of claim 15 wherein the different moments in time comprise moments in time over a fixed length of time relative to present time.
22. The apparatus of claim 21 wherein the processing circuitry is configured to remove quantities of documents from the images responsive to the respective documents of the removed quantities of documents having a date of reception greater than a threshold relative to the present time.
23-42. (canceled)